Adding the Idle Micro Benchmark into the openj9-systemtest repo. #13

Open: 5 commits to be merged into base: master

tanvikini

The Idle Micro Benchmark has been written to test the Idle Tuning feature and its proper working.
This test needs to be committed into the openj9-systemtest repo as a stress test.

Signed-off-by: Tanvi Kini [email protected]

@tanvikini
Author

@lumpfish
I am trying to commit a new test into the openj9-systemtest repository and have a few questions about the process.

  1. The changes I have made to the makefile: I have added the test and passed the JVMARGS directly to $(STF_COMMAND) there. Is this okay, or do they have to be translated to Modes?
    I am unable to find where the Modes are defined in the repo. It would be helpful to have your input here.

  2. Two of the tests - test.IdleBenchMark_GcOnIdle and test.IdleBenchMark_CompactOnIdle are meant to be run only on the Linux AMD architecture and not the others.
    How can I exclude these tests from running on other architectures?

@lumpfish
Contributor

lumpfish commented Dec 7, 2017

Regarding the test metadata - modes and platforms etc. - these should be specified in the playlist.xml file for the openj9-systemtests at https://github.com/AdoptOpenJDK/openjdk-tests. As I write this, the playlist.xml file is being created for the first time to enable the tests in this repository to be executed alongside tests from other repositories at https://adoptopenjdk.net/. Once it is there, the "make test-target" command lines for these tests, along with any mode and platform metadata will need to be added to that file.
If these tests are only applicable to certain platforms, I suggest you add a check in the tests themselves to output an error message if someone does run them on a non-applicable platform.
Where the tests themselves require very specific arguments as these tests do, hard coding them in the tests is the right thing to do. In fact, rather than adding them on the makefile command lines as you have done, why not just hard code them into the java command lines in the STF code?
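
The platform check suggested above could look roughly like the following. This is a minimal sketch, not STF code: the class and method names are hypothetical, and it assumes that "Linux AMD" corresponds to an `os.name` containing "linux" and an `os.arch` of `amd64` or `x86_64`.

```java
import java.util.Locale;

// Hypothetical helper (not part of STF): checks whether a Linux AMD-only subtest may run here.
final class IdlePlatformCheck {
    // Returns true when the OS/arch pair looks like Linux on x86-64.
    // Assumption: "Linux AMD" means os.arch is reported as "amd64" or "x86_64".
    static boolean isLinuxAmd64(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        return os.contains("linux") && (arch.equals("amd64") || arch.equals("x86_64"));
    }

    // Fails with a clear message if a restricted subtest is started on another platform.
    static void requireLinuxAmd64(String subtest, String osName, String osArch) {
        if (!isLinuxAmd64(osName, osArch)) {
            throw new IllegalStateException(subtest
                    + " is only applicable to Linux amd64, but this platform is "
                    + osName + " / " + osArch);
        }
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        System.out.println("GcOnIdle applicable here: " + isLinuxAmd64(osName, osArch));
    }
}
```

Calling `requireLinuxAmd64` at the start of the GcOnIdle and CompactOnIdle subtests would turn an accidental run on another platform into an explicit error rather than a confusing failure.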

@lumpfish
Contributor

lumpfish commented Dec 8, 2017

@tanvikini - thanks for giving me access to your fork.
I'm getting a build error:
git/openj9-systemtest/openj9.test.load/src/test.load/net/openj9/stf/IdleLoadTest.java:73: error: ';' expected
[javac] test.doRunForegroundProcess("Run idle load test", "ILT", Echo.ECHO_ON,
[javac] ^
[javac] 1 error

@lumpfish
Contributor

Regarding hard coding the modes - one approach is to use the STF -test-args=xxxx option to say which subtest to run, and then set the -X args accordingly - e.g.

-test-args="subtest=test1"
-test-args="subtest=test2"

and then in the test automation:
if subtest == "test1" then options = -Xoption1
if subtest == "test2" then options = -Xoption2

See https://github.com/eclipse/openj9-systemtest/blob/master/openj9.test.sharedClasses/src/test.sharedClasses/net/openj9/stf/SharedClasses.java for an example.
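
As a rough illustration of the approach described above, the selection logic might be sketched in plain Java as follows. This does not use the real STF argument-handling API; the class and method names are hypothetical, the comma-separated `key=value` format is an assumption, and `-Xoption1` / `-Xoption2` are the placeholders from the comment, not real JVM options.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps a -test-args value such as "subtest=test1" to JVM options.
final class SubtestOptions {
    // Splits comma-separated key=value pairs (assumed format) into a map.
    static Map<String, String> parseTestArgs(String testArgs) {
        Map<String, String> parsed = new HashMap<>();
        for (String pair : testArgs.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                parsed.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return parsed;
    }

    // Picks the placeholder option for the requested subtest, rejecting unknown values.
    static String optionsFor(String testArgs) {
        String subtest = parseTestArgs(testArgs).get("subtest");
        if (subtest == null) {
            throw new IllegalArgumentException("No subtest given in: " + testArgs);
        }
        switch (subtest) {
            case "test1":
                return "-Xoption1";
            case "test2":
                return "-Xoption2";
            default:
                throw new IllegalArgumentException("Unknown subtest: " + subtest);
        }
    }

    public static void main(String[] args) {
        System.out.println(optionsFor("subtest=test1"));
    }
}
```

Rejecting unknown subtest names keeps a typo on the command line from silently running the test with no special options.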

@tanvikini force-pushed the add_idle_test branch 2 times, most recently from 443d854 to a9ec747, on December 22, 2017 10:47

@tanvikini
Author

@lumpfish I have made changes to the makefile as well as to IdleLoadTest.java which is the automation file. Sorry for the delay in getting back. I took some time as I needed to make some more changes to the test logic. Please have a look at my changes and let me know if you are happy with them. Thank you!

@tanvikini changed the title from "WIP: Adding the Idle Micro Benchmark into the openj9-systemtest repo." to "Adding the Idle Micro Benchmark into the openj9-systemtest repo." on Jan 10, 2018

import net.adoptopenjdk.stf.runner.modes.HelpTextGenerator;

/**
* This is a test plugin for Java.util related tests, it runs a workload of
Contributor

This comment doesn't seem right.

Author

Changed this to "This is a test plugin to run the Idle Microbenchmark as a load test which runs workloads from the test.idle project."

@lumpfish
Contributor

@tanvikini - I have run the tests from your branch locally and the JVM goes out of memory after what appears to be the third invocation of the tests:
ILT 14:15:31.441 - Completed 16.7%. Number of tests started=6 (+0)
ILT 14:15:51.462 - Completed 22.2%. Number of tests started=8 (+2)
ILT 14:16:11.471 - Completed 25.0%. Number of tests started=9 (+1)
ILT stderr JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2018/01/10 14:16:19 - please wait.
ILT stderr JVMDUMP032I JVM requested System dump using '/tmp/stf/20180110-140428-IdleLoadTest/results/core.20180110.141619.31015.0001.dmp' in response to an event
The java process then hangs.
Do the tests run OK for you locally?

@tanvikini
Author

@lumpfish The test runs fine and passes for me when I run a GcOnIdle subtest but I too am seeing an OOM being thrown when I run the CompactOnIdle subtest. The test is running out of heap space. I will look into this. Thanks.

@tanvikini
Author

@sabkrish Observations from CompactOnIdle runs :

  • Tried running with -Xmx set to 2048m instead of 1024m. The test runs fine and passes.
  • Tried running with -XX:+IdleTuningGcOnIdle passed along with -XX:+IdleTuningCompactOnIdle. The test throws an OutOfMemoryError as mentioned above but runs and passes anyway.

Not sure if increasing the Xmx value is the appropriate workaround for this.

@lumpfish
Contributor

What happens if the test is run without any of the -XX:Idle arguments - i.e. is it just the nature of the workload which requires the heap or is it an effect of the -XX:Idle implementation?

@tanvikini
Author

@lumpfish - the test is designed to consume heap so that the effect of the idle tuning feature can be clearly observed. It is the nature of the test. We still should not be seeing an OOM. I just tried running the test again with -XX:+IdleTuningCompactOnIdle and -Xmx1024m, and I am unable to recreate the OOM at the moment, even in two separate runs. I'm unsure if this is because the box I use to run these tests has been recently rebooted. I am going to try cloning my branch and running it on a different machine to see if the same behaviour is observed.
Would you by any chance have the gc.verbose file from the run you ran? I've lost my earlier logs from the failing runs because of the reboot.

@lumpfish
Contributor

I can't see a gc.verbose file in the test output, just the regular OOM dumps and jitlog.timestamp file.

@tanvikini
Author

Ran the same test on a test machine jsvt153.hursley.ibm.com and could not reproduce the OOM there either. The gc.verbose file should be under /tmp/stf/<date_time_IdleLoadTest>/results.

@lumpfish
Contributor

There is no gc.verbose file produced on my test runs.

This is my test command line:
perl /home/user/git/stf/stf.core/scripts/stf.pl -test=IdleLoadTest -test-args="variation=MinIdleWaitTime" -test-root="/home/user/git/openj9-systemtest

When the test runs there is no option to ask for a gc log. These are the Java -X options passed to the java process:
/home/user/java/bin/java -XX:IdleTuningMinIdleWaitTime=180 -Xmx1024m -Xjit:verbose={compilePerformance},vlog=jitlog

Are you running the same code which is checked into your branch?

@tanvikini
Author

I was running the compactOnIdle subtest all this time to check for OOM. After seeing your comment, I ran the MinIdleWaitTime and am able to reproduce the OOMs. It's surprising though that I didn't see it in my earlier runs. I validated that the testcase is creating more objects causing the OOM when the MinIdleWaitTime test is run. I can reduce the number of iterations for this test or increase the heap size for this alone. Do you see any issues with increased heap size as the MinIdleWaitTime can run on all platforms including z?

@lumpfish
Contributor

Regarding "Do you see any issues with increased heap size.. ?" The issue would be whether changing the heap size affects the test outcome. I had assumed that the test would demonstrate that if these JVM options are in use then heap is returned to the system when it is not in use and that there is an overall improvement in performance. As I have not yet been able to run a test to completion I don't know if the test output demonstrates that. What effect does increasing the heap size have on the test results?

@tanvikini
Author

tanvikini commented Jan 16, 2018

There are two parts to Idle detection and Memory management.

  • Idle detection is done by the JIT and it notifies the VM during idle. GC just looks for the notification. JIT notification is enabled when the -XX:IdleTuningMinIdleWaitTime option is passed.
  • Memory management is handled by GC. When it receives the notification, it either releases the free pages from memory (controlled by GcOnIdle option) or compacts the heap (controlled by -XX:IdleTuningCompactOnIdle option).

You will see the memory usage improvement when -XX:IdleTuningGcOnIdle option is enabled.

The three options translate into three subtests:

  1. MinIdleWaitTime - This tests that when JIT notification is enabled, i.e. the -XX:IdleTuningMinIdleWaitTime option is passed, we don't regress. The only pass/fail criterion here is that there is no crash.

  2. GcOnIdle - This tests the -XX:IdleTuningGcOnIdle option, which performs a GC and actually releases the free pages during an idle period.

  3. CompactOnIdle - This tests the -XX:IdleTuningCompactOnIdle option, which performs compactions during an idle period based on certain conditions, such as the
    amount of tenure memory that is free and the degree to which the heap is fragmented.

The test itself creates a large number of big objects to consume heap. The first subtest will not be affected by an increase in the heap size and will still run as intended as there is no reduction in the heap size when the -XX:IdleTuningGcOnIdle option isn't passed.
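
To make the mapping from subtest to JVM options concrete, here is a minimal sketch. It is not the code in IdleLoadTest.java: the class and method names are hypothetical, the `-Xmx1024m` and `IdleTuningMinIdleWaitTime=180` values are taken from the command line quoted earlier in this thread, the `-XX:+...` spellings follow the earlier CompactOnIdle observations, and it assumes that the GcOnIdle and CompactOnIdle variations also pass the wait-time option so that the JIT notification is enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds the JVM options for each idle tuning variation.
final class IdleVariationOptions {
    static List<String> jvmOptionsFor(String variation) {
        List<String> options = new ArrayList<>();
        // Values as seen in the java command line quoted earlier in this thread.
        options.add("-Xmx1024m");
        // Assumption: every variation enables JIT idle notification with this option.
        options.add("-XX:IdleTuningMinIdleWaitTime=180");
        switch (variation) {
            case "MinIdleWaitTime":
                break; // notification only, no extra GC behaviour
            case "GcOnIdle":
                options.add("-XX:+IdleTuningGcOnIdle");
                break;
            case "CompactOnIdle":
                options.add("-XX:+IdleTuningCompactOnIdle");
                break;
            default:
                throw new IllegalArgumentException("Unknown variation: " + variation);
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", jvmOptionsFor("GcOnIdle")));
    }
}
```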

@lumpfish
Contributor

So to accommodate running on zOS 31 bit (max heap 1Gb) what are the options?

@sabkrish
Contributor

Only the "MinIdleWaitTime" subtest is supported on all platforms. Currently, the LoadTest specification is started with 3 threads and a test count of 36, i.e. we run 12 iterations. The number of iterations can be reduced.

@tanvikini
Author

Tried running the MinIdleWaitTime subtest with 6 iterations instead of 12; the test runs fine and passes without throwing an OOM.

@lumpfish
Contributor

lumpfish commented Feb 5, 2018

The new tests should be added to the TEST_TARGETS macro in openj9/makefile so that they will appear in the 'make help' output.

@tanvikini
Author

@lumpfish replying to your previous comment, I have made changes to the makefile under openj9.build and this is the output I see

tanvi@j9x3650m4:~/open-hack/git/openj9-systemtest/openj9.build$ make help
makefile:106: OPENJ9_SYSTEMTEST_ROOT is /home/tanvi/open-hack/git/openj9-systemtest
makefile:144: STF_ROOT is /home/tanvi/open-hack/git/stf
makefile:170: OPENJDK_SYSTEMTEST_ROOT is /home/tanvi/open-hack/git/openjdk-systemtest
makefile:178: PREREQS_ROOT is /home/tanvi/open-hack/git/openj9-systemtest/../../systemtest_prereqs
makefile:208: RESOLVED_PREREQS_ROOT is /home/tanvi/open-hack/git/openj9-systemtest/../../systemtest_prereqs
makefile:336: JAVA_HOME is /home/tanvi/tanvi/builds/80sr5/july31st/ibm-java-x86_64-80
makefile:339: /home/tanvi/tanvi/builds/80sr5/july31st/ibm-java-x86_64-80/bin/java -fullversion returned
makefile:340: java full version JRE 1.8.0 IBM Linux build 8.0.5.0 - pxa6480sr5-20170731_01(SR5 Beta-1)
make or make build: Builds openj9-systemtest projects
make test: Runs all openj9-systemtest tests
make test.list test.help test.DaaLoadTest_daa1 test.DaaLoadTest_daa2 test.DaaLoadTest_daa3 test.DaaLoadTest_daaAll test.HeapHogLoadTest test.ObjectTreeLoadTest test.SharedClassesWorkload test.SharedClassesAPI test.SharedClassesWorkloadTest_Softmx_Increase test.SharedClassesWorkloadTest_Softmx_IncreaseDecrease test.SharedClassesWorkloadTest_Softmx_Increase_JitAot test.SharedClasses.SCM23.SingleCL test.SharedClasses.SCM23.MultiCL test.SharedClasses.SCM23.MultiThread test.SharedClasses.SCM23.MultiThreadMultiCL test.IdleBenchMark_MinIdleWaitTime test.IdleBenchMark_GcOnIdle test.IdleBenchMark_CompactOnIdle: Runs all openj9-systemtest tests
make test.xxxx: Runs individual test xxxx

Is this what is expected?

@tanvikini
Author

Also, about the OOM observed in the MinIdleWaitTime test, I have not been able to reproduce it yet. I am trying to work with Sabari and Param to figure out what is wrong. Meanwhile, would it be reasonable to ask that the test be added to the repo while opening a separate issue to continue investigating the OOM for this test?

There are 2 other tests - GcOnIdle and CompactOnIdle. Perhaps you could try running one of them to see if they run to completion for you and if you are happy with them? This way we can at least get the working tests committed and deal with the failing test through an issue.
Please let me know what you think.

@lumpfish
Contributor

I have done the following:

  1. Used this java:
    java version "1.8.0_171"
    Java(TM) SE Runtime Environment (build 8.0.6.0 - pxa6480sr6-20180316_01(SR6))
    IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64 Compressed References 20180315_381333 (JIT enabled, AOT enabled)
    OpenJ9 - 48fb9a1
    OMR - 3dfc56d
    IBM - 4453dac)
    JCL - 20180308_02 based on Oracle jdk8u171-b09

  2. git clone [email protected]:tanvikini/openj9-systemtest

  3. git checkout add_idle_test

  4. cd openj9.build

  5. make

  6. ~/git/openj9-systemtest/openj9.build$ perl /git/stf/stf.core/scripts/stf.pl -test=IdleLoadTest -test-args="variation=MinIdleWaitTime" -test-root="/git/openj9-systemtest"

  • fails with OOM at 16.7% complete
  7. same command but with -test-args="variation=GcOnIdle"
  • fails with OOM at 8.3% complete
  8. same command but with -test-args="variation=CompactOnIdle"
  • fails with OOM at 8.3% complete

Am I running the correct code from your fork?

@tanvikini
Author

@lumpfish While going through the logs of the failing tests that you provided, we see that the test is failing right at the start, after initializing the 3 different workload types: transactional, image and spike. We suspect that the spike workload might be the cause. I have commented it out of the inventory file and reduced the number of parallel threads to 2. Would you try to run the test with the latest changes and let me know if you still see the failure?

Meanwhile, I will be running the tests on a virtual box on my laptop with the same configuration as yours (4 cpus and 3 GB RAM) to see if I can replicate the error. I will keep this space updated with my findings.

@tanvikini force-pushed the add_idle_test branch 2 times, most recently from c01681b to c87b315, on April 17, 2018 12:49

@tanvikini
Author

I was able to recreate the OOM by setting up a virtual machine on my personal laptop with 3 GB RAM and 2 CPUs. I was able to fix the issue for myself by halving the number of threads for each of the workloads:
  • from 15 to 8 for the image workload,
  • from 10 to 5 for the transactional workload, and
  • from 5 to 3 for the spike workload.

This works for me for all 3 tests on my VM setup and I am not seeing the issue anymore. The conflict with the playlist.xml file also seems to be resolved now.
@lumpfish would you try running the test and let me know if my current changes have resolved the issue for you as well?
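
For reference, the new thread counts quoted above are consistent with halving and rounding up. The snippet below is only an arithmetic illustration with hypothetical names; the actual values live in the test's workload configuration.

```java
// Hypothetical illustration of the halving applied to the workload thread counts.
final class ThreadCounts {
    // Halves a thread count, rounding up so an odd count does not fall below half.
    static int halveRoundingUp(int threads) {
        return (threads + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println("image: " + halveRoundingUp(15));         // image: 8
        System.out.println("transactional: " + halveRoundingUp(10)); // transactional: 5
        System.out.println("spike: " + halveRoundingUp(5));          // spike: 3
    }
}
```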

@lumpfish
Contributor

With the reduction in the values of the test parameters, are the tests still meeting their goals?

@tanvikini
Author

We were initially using this test to uncover synchronization issues which it did. These parameters are fine for that purpose. Also, since we are running all 3 workloads in parallel, I believe the goal of stressing the JVM is still being met.

TEST_TARGETS:=test.list \
test.help \
$(DAA_TESTS) \
$(GC_TESTS) \
$(SHARED_CLASSES_TESTS) \
$(IDLE_TESTS)
Contributor

Needs a backslash on this line

@lumpfish
Contributor

The tests have now run successfully on my local machine.
I needed to make a local change to work around #27
The tests take quite a while to run on my laptop (30 - 60 mins), so @Mesbah-Alam please be aware of this when adding the tests to any build - test pipelines.

@karianna
Contributor

karianna commented Oct 2, 2024

@tanvikini needs a rebase
